fix(spinlock): bound OptionalSharedLockGuard spin in signal handler paths#572
fix(spinlock): bound OptionalSharedLockGuard spin in signal handler paths#572jbachorik wants to merge 3 commits into
Conversation
Replace OptionalSharedLockGuard's unbounded tryLockShared spin with a CAS-retry budget (DEFAULT_SHARED_SPIN_BUDGET=256). Under heavy reader contention the guard now returns false after exhausting the budget rather than spinning indefinitely inside a signal handler. Adds debug-mode assertions to unlock()/unlockShared(), a new JVMTI_STACKS_DROPPED_REC_LOCK observability counter, and a gtest suite for the new behaviour. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
CI Test ResultsRun: #26804864801 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-02 07:38:14 UTC |
Fixes TSAN data race: plain volatile reads in assert() raced with concurrent atomic CAS writes from other threads. Use __atomic_load_n (RELAXED) for assert invariant checks and __atomic_store_n for reset(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f3ad4d1b91
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Adds SAMPLES_DROPPED_REC_LOCK counter and increments it in the recordEvent() else branch, mirroring what recordEventDelegated() already does for JVMTI_STACKS_DROPPED_REC_LOCK. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kaahos
left a comment
There was a problem hiding this comment.
It looks good to me! I've just suggested a small non-blocking modification in profiler.cpp:1485.
| rec->addThread(lock_index, tid); | ||
| } | ||
| } else { | ||
| Counters::increment(JVMTI_STACKS_DROPPED_REC_LOCK); |
There was a problem hiding this comment.
nit: in profiler.cpp:1485, I think that JVMTI_STACKS_DROPPED_LOCK and JVMTI_STACKS_DROPPED_REC_LOCK should be summed up for the error message to convey relevant info.
What does this PR do?:
OptionalSharedLockGuardpreviously used an unboundedtryLockSharedspin loop that could stall indefinitely inside signal handlers under heavy reader CAS contention. This PR replaces the unbounded variant with a bounded one (default budget: 256 CAS-retry iterations) and makes it the sole implementation.Motivation:
Signal handlers must return promptly. Under heavy concurrent sampling (many threads recording events simultaneously), CAS contention on
_rec_lockcould cause signal handlers reachingFlightRecorder::recordEvent*to spin for an unbounded number of iterations. When the budget is exhausted, the sample is dropped and a newJVMTI_STACKS_DROPPED_REC_LOCKcounter is incremented for observability.Additional Notes:
OptionalSharedLockGuardnow takes an optionalmax_spinsparameter (defaultSpinLock::DEFAULT_SHARED_SPIN_BUDGET = 256). The budget bounds CAS-retry iterations, not wall-clock latency.unlock()andunlockShared()gain debug-modeasserts to catch mismatched-variant misuse as the class grows.OptionalSharedLockGuardusages inflightRecorder.cpppick up the bounded behaviour automatically — no call-site changes needed.BoundedOptionalSharedLockGuardwas an intermediate name used during development; it is not present in this PR.How to test the change?:
New gtest suite:
ddprof-lib/src/test/cpp/spinLock_ut.cppOptionalGuard_UncontendedAcquire— acquires and releases correctlyOptionalGuard_ExclusiveHeld_ImmediateReturn— exclusive lock causes immediate false return (no spin)OptionalGuard_SpinBudgetBound— tiny budget under reader CAS contention; guard returns without hangingOptionalGuard_BudgetEnforced_ExclusivePath— deterministic: exclusive lock always returns false for any budgetOptionalGuard_SharedReentrancy— multiple shared guards coexist; exclusive try failsTryLockShared_ExclusiveHeld_ReturnsFalse/TryLockShared_Free_ReturnsTrue— unbounded overload sanityRun with:
For Datadog employees: